Analysis of the algorithm: From kernels to backup genes.

Kernelization section

The algorithm transformed each semantic similarity matrix to make it compatible with a kernel. Once this had been done for every network and kernel type, the resulting kernels were integrated by kernel type. The tables below summarise the properties of the matrices at each phase of the process.

Annotation properties

Table 1. Annotation file descriptors

Net Min Max Average Standard_Deviation
biological_process 1 134 7.00 11.43
cellular_component 1 40 4.16 5.25
disease 1 21 2.23 2.91
interaction 1 729 29.84 54.04
molecular_function 1 26 3.03 3.72
pathway 1 191 4.00 8.70
phenotype 1 335 31.55 46.99
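
The four descriptors in Table 1 can be reproduced from a list of counts. The following is a minimal sketch, not the code used to build the table: it assumes each annotation file yields one count per entry (for example, annotations per gene), and it assumes the population standard deviation, since the source does not say which variant was used. The function name and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def describe_counts(counts):
    """Summarise a list of counts with the four descriptors of Table 1."""
    return {
        "Min": min(counts),
        "Max": max(counts),
        "Average": mean(counts),
        # Assumption: population standard deviation; use statistics.stdev
        # instead if the sample standard deviation is intended.
        "Standard_Deviation": pstdev(counts),
    }


# Hypothetical toy data, not taken from any annotation file.
toy_counts = [1, 2, 2, 3, 7]
summary = describe_counts(toy_counts)
print(summary)
```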

Matrix properties

Table 2. Similarity matrices

Net Matrix_Dimensions Matrix_Elements Matrix_Elements_Non_Zero
biological_process_sim 16997x16997 288898009 262330240
cellular_component_sim 17963x17963 322669369 322651406
disease_sim 4168x4168 17372224 16578192
genetic_interaction_sim 17354x17354 301161316 301143962
interaction_sim 16098x16098 259145604 479348
molecular_function_sim 17335x17335 300502225 300484890
pathway_sim 3828x3828 14653584 159182
phenotype_sim 5077x5077 25775929 25770852
protein_interaction_sim 16098x16098 259145604 479348
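
The columns of Table 2 follow directly from the shape and contents of each matrix. The sketch below shows one way to derive them, assuming the matrix is held in memory as a list of rows. The function name and the `density` field are illustrative additions, not part of the original pipeline.

```python
def matrix_descriptors(matrix):
    """Return dimensions, element count and non-zero count of a matrix."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    n_elements = n_rows * n_cols
    non_zero = sum(1 for row in matrix for value in row if value != 0)
    return {
        "Matrix_Dimensions": f"{n_rows}x{n_cols}",
        "Matrix_Elements": n_elements,
        "Matrix_Elements_Non_Zero": non_zero,
        # Illustrative extra field: fraction of non-zero elements.
        "density": non_zero / n_elements if n_elements else 0.0,
    }


# Hypothetical 3x3 similarity matrix.
toy_matrix = [
    [1.0, 0.0, 0.3],
    [0.0, 1.0, 0.0],
    [0.3, 0.0, 1.0],
]
descriptors = matrix_descriptors(toy_matrix)
print(descriptors)
```

Applied to the figures in Table 2, the same ratio shows that interaction_sim and pathway_sim are very sparse (about 0.2% and 1.1% non-zero elements respectively), whereas the remaining matrices are almost fully dense.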

Table 3. Filtered similarity matrices

Table 4. Uncombined kernel matrices

Net Kernel Matrix_Dimensions Matrix_Elements Matrix_Elements_Non_Zero
biological_process ct 15226x15226 231831076 231831076
biological_process el 15226x15226 231831076 231831076
biological_process ka 15226x15226 231831076 206175644
biological_process rf 15226x15226 231831076 231831076
cellular_component ct 476x476 226576 226576
cellular_component el 476x476 226576 226576
cellular_component ka 476x476 226576 226576
cellular_component rf 476x476 226576 226576
disease ct 3811x3811 14523721 14523721
disease el 3811x3811 14523721 14523721
disease ka 3811x3811 14523721 13827029
disease rf 3811x3811 14523721 14523721
molecular_function ct 2838x2838 8054244 8054244
molecular_function el 2838x2838 8054244 8054244
molecular_function ka 2838x2838 8054244 8054244
molecular_function rf 2838x2838 8054244 8054244
pathway ct 3419x3419 11689561 11675893
pathway el 3419x3419 11689561 8582445
pathway ka 3419x3419 11689561 162521
pathway rf 3419x3419 11689561 8582445
phenotype ct 3344x3344 11182336 11182336
phenotype el 3344x3344 11182336 11182336
phenotype ka 3344x3344 11182336 11182336
phenotype rf 3344x3344 11182336 11182336
protein_interaction ct 16098x16098 259145604 259081193
protein_interaction el 16098x16098 259145604 252047984
protein_interaction ka 16098x16098 259145604 495446
protein_interaction rf 16098x16098 259145604 252047984

Table 5. Integrated kernel matrices

Integration Kernel Matrix_Dimensions Matrix_Elements Matrix_Elements_Non_Zero
integration_mean_by_presence ct 18802x18802 353515204 327774838
integration_mean_by_presence el 18802x18802 353515204 323440310
integration_mean_by_presence ka 18802x18802 353515204 215990928
integration_mean_by_presence rf 18802x18802 353515204 323440310
mean ct 18802x18802 353515204 327774838
mean el 18802x18802 353515204 323440310
mean ka 18802x18802 353515204 215990928
mean rf 18802x18802 353515204 323440310
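
The source names the two integration strategies but does not define them. The sketch below illustrates one plausible reading, stated here as an assumption: "mean" divides the summed kernel values for a gene pair by the total number of kernels, while "integration_mean_by_presence" divides by the number of kernels in which both genes are present. Under this reading the two strategies differ only in the divisor, which would be consistent with the identical non-zero counts in Table 5. All names and the toy data are hypothetical.

```python
def integrate_kernels(kernels, by_presence=False):
    """Average kernels over the union of their genes.

    kernels: list of (genes, matrix) pairs, where matrix[a][b] is the
    kernel value for genes[a] and genes[b].
    """
    all_genes = sorted(set().union(*(genes for genes, _ in kernels)))
    index = {gene: i for i, gene in enumerate(all_genes)}
    n = len(all_genes)
    total = [[0.0] * n for _ in range(n)]
    present = [[0] * n for _ in range(n)]

    # Accumulate values and count the kernels that contain each gene pair.
    for genes, matrix in kernels:
        for a, gene_a in enumerate(genes):
            for b, gene_b in enumerate(genes):
                i, j = index[gene_a], index[gene_b]
                total[i][j] += matrix[a][b]
                present[i][j] += 1

    integrated = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Assumption: the divisor is the only difference between strategies.
            divisor = present[i][j] if by_presence else len(kernels)
            if divisor:
                integrated[i][j] = total[i][j] / divisor
    return all_genes, integrated


# Two hypothetical kernels that share only gene "B".
toy_kernels = [
    (["A", "B"], [[1.0, 0.5], [0.5, 1.0]]),
    (["B", "C"], [[2.0, 0.4], [0.4, 2.0]]),
]
genes, mean_kernel = integrate_kernels(toy_kernels)
_, presence_kernel = integrate_kernels(toy_kernels, by_presence=True)
print(genes)
print(mean_kernel[0][1], presence_kernel[0][1])
```

In the toy example the pair (A, B) appears in one of the two kernels, so its integrated value is 0.25 under the plain mean and 0.5 under the mean by presence. The pair (A, C) never co-occurs and stays at zero in both cases.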

Weight values